Online Apprenticeship Learning

Authors

Abstract

In Apprenticeship Learning (AL), we are given a Markov Decision Process (MDP) without access to the cost function. Instead, we observe trajectories sampled by an expert that acts according to some policy. The goal is to find a policy that matches the expert's performance on some predefined set of cost functions. We introduce an online variant of AL (Online Apprenticeship Learning; OAL), where the agent is expected to perform comparably to the expert while interacting with the environment. We show that the OAL problem can be effectively solved by combining two mirror descent based no-regret algorithms: one for policy optimization and another for learning the worst case cost. By employing optimistic exploration, we derive a convergent algorithm with O(sqrt(K)) regret, where K is the number of interactions with the MDP, and an additional linear error term that depends on the amount of expert trajectories available. Importantly, our algorithm avoids the need to solve an MDP at each iteration, making it more practical compared to prior AL methods. Finally, we implement a deep variant of our algorithm which shares some similarities to GAIL, but where the discriminator is replaced with the costs learned by OAL. Our simulations suggest that OAL performs well in high dimensional control problems.
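
To make the two-player structure concrete, here is a minimal Python sketch of the scheme the abstract describes, on a small tabular MDP with known transitions and linear costs. Every name (oal_sketch, phi, expert_feat, the step sizes) is an illustrative assumption rather than the paper's code, and the sketch omits the optimistic exploration and transition estimation that the actual algorithm uses to obtain its regret bound.

import numpy as np

def oal_sketch(P, phi, expert_feat, horizon, n_iters=500,
               eta_pi=0.1, eta_w=0.1):
    """Hedged sketch of the two-player OAL loop (not the paper's code).

    P           : known transition tensor, shape (S, A, S')
    phi         : cost features, shape (S, A, d)
    expert_feat : expert's expected cumulative feature vector, shape (d,)
    """
    S, A, d = phi.shape
    pi = np.full((S, A), 1.0 / A)   # policy player: stochastic policy
    w = np.zeros(d)                 # cost player: c_w(s,a) = <w, phi(s,a)>

    for _ in range(n_iters):
        cost = phi @ w              # (S, A) current cost estimates

        # Finite-horizon policy evaluation under pi (backward recursion).
        V = np.zeros(S)
        Q = np.zeros((S, A))
        for _ in range(horizon):
            Q = cost + P @ V                    # (S, A, S') @ (S',) -> (S, A)
            V = (pi * Q).sum(axis=1)

        # Occupancy measure of pi, assuming a uniform initial state.
        occ = np.zeros((S, A))
        mu = np.full(S, 1.0 / S)
        for _ in range(horizon):
            occ += mu[:, None] * pi
            mu = np.einsum('s,sa,sat->t', mu, pi, P)

        # Policy player: mirror descent (exponentiated gradient) step
        # toward lower cost; the update stays on the probability simplex.
        pi = pi * np.exp(-eta_pi * Q)
        pi /= pi.sum(axis=1, keepdims=True)

        # Cost player: ascend the agent-vs-expert feature gap to track the
        # worst-case cost, projecting w back onto the unit l2 ball.
        agent_feat = np.einsum('sa,sad->d', occ, phi)
        w += eta_w * (agent_feat - expert_feat)
        w /= max(1.0, np.linalg.norm(w))

    return pi, w

Because both players run no-regret updates against each other, their averages converge to a saddle point; this is the sense in which no MDP has to be solved to optimality at any single iteration.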


Related articles

Bootstrapping Apprenticeship Learning

We consider the problem of imitation learning where the examples, given by an expert, cover only a small part of a large state space. Inverse Reinforcement Learning (IRL) provides an efficient tool for generalizing the partial demonstration, based on the assumption that the expert is maximizing an unknown utility function. IRL consists in learning a reward function that explains the expert...

Full text

Structured Apprenticeship Learning

We propose a graph-based algorithm for apprenticeship learning when the reward features are noisy. Previous apprenticeship learning techniques learn a reward function by using only local state features. This can be a limitation in practice, as often some features are misspecified or subject to measurement noise. Our graphical framework, inspired by the work on Markov Random Fields, allows us to ...

Full text

Semi-Supervised Apprenticeship Learning

In apprenticeship learning we aim to learn a good policy by observing the behavior of an expert or a set of experts. In particular, we consider the case where the expert acts so as to maximize an unknown reward function defined as a linear combination of a set of state features. In this paper, we consider the setting where we observe many sample trajectories (i.e., sequences of states) but only...

Full text

Safety-Aware Apprenticeship Learning

Apprenticeship learning (AL) is a class of “learning from demonstrations” techniques where the reward function of a Markov Decision Process (MDP) is unknown to the learning agent and the agent has to derive a good policy by observing an expert’s demonstrations. In this paper, we study the problem of how to make AL algorithms inherently safe while still meeting their learning objective. We conside...

Full text

Whatever happened to apprenticeship learning?

BACKGROUND I have been a clinical tutor for 10 years in Worthing Hospital, UK. During this time I have seen an increased emphasis on classroom teaching, assessments in controlled situations and simulation, rather than on apprenticeship learning during well-supervised clinical working. CONTEXT At the educational conference on 'Learning without Leaving the Workplace' hosted by my hospital, I ha...

Full text


Journal

Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence

Year: 2022

ISSN: 2159-5399, 2374-3468

DOI: https://doi.org/10.1609/aaai.v36i8.20798